Review fixes for #715: drop scope creep, dead code, redundant test; dedup row-major LDE by MauroToscano · Pull Request #741 · yetanotherco/lambda_vm

MauroToscano · 2026-06-29T20:45:37Z

Review fixes for #715 (row-major GPU LDE). Targets the PR branch so they fold into #715. Net cleanup, −~150 lines.

Changes

Revert unrelated FxHashMap op-dedup change (scope creep). The FxHasher/FxHashMap micro-optimization in types.rs + branch/dvrm/lt/mul.rs is unrelated to the LDE rework and was only applied to 4 of 6 dedup tables (eq.rs/bytewise.rs left on std HashMap). Reverted to std::collections::HashMap; it can land as its own focused PR.
Remove redundant gpu_and_cpu_proofs_both_verify test. The GPU full path is covered by the normal prove/verify suite built with --features cuda (plus gpu_path_fires_end_to_end), the CPU path by the non-cuda suite, and GPU/CPU equivalence by the merkle_root_parity / barycentric_cpu_gpu_parity tests. Its "force CPU" leg also never ran on CPU: gpu_lde_threshold() only re-read the env var under #[cfg(test)], but from the prover integration-test crate stark compiles without cfg(test), so the OnceLock cached the first value and the second set_var was a no-op. Also simplified gpu_lde_threshold() to a single cached impl now that the per-call re-read has no consumer.
Fix stale docs / dead code. Moved keccak256_leaves_base_row_major out of keccak_merkle_level's doc block (it had split the "child pair → one parent" comment away from its kernel). Deleted columns_to_row_major in prover.rs — zero callers after the row-major path stopped materializing GPU-expanded columns.
Guard row-major keccak. Added debug_assert!(num_rows >= 2) to launch_keccak_base_row_major; the kernel computes __brevll(tid) >> (64 - log_num_rows), which is UB at num_rows == 1. Matches the existing guard in launch_keccak_base.
Consolidate the row-major LDE pipeline. Extracted coset_lde_row_major_inner, shared by the base and ext3 _keep entry points (they differed only by m vs m * 3 and the handle type), removing ~110 lines of drift-prone duplication.
Fix stale assertion in gpu_path_fires_end_to_end. It asserted gpu_parts_lde_calls() > 0 with a comment claiming branch/shift tables are degree-3 — both false: fib_iterative_1M tables all have number_of_parts <= 2, and the common degree-2 case fires the fused two-halves path (gpu_extend_halves_calls), counted separately from the parts > 2 path since perf(stark): fuse composition half-extension onto coset_lde_full (precomputed twiddles) #700. Now asserts on the sum so either composition-LDE path satisfies it.

Verification

GPU-validated on RTX 5090 / CUDA 13.1:

make test-math-cuda → 78/78 passed, including merkle_root_parity (base + ext3) and barycentric_cpu_gpu_parity — confirms the coset_lde_row_major_inner consolidation is behavior-preserving.
make test-cuda-integration → green (gpu_path_fires_end_to_end: GPU dispatches fire, proof verifies).
make bench-prover-cuda → prove(fib_iterative_1M) = 5.89s, GPU LDE path fired.

Host checks: cargo check + cargo clippy -p lambda-vm-prover clean; all edited Rust files rustfmt-clean; the keccak.cu change is a pure reorder (added lines == removed lines, no logic change).

The FxHasher/FxHashMap op-dedup micro-optimization is unrelated to the row-major GPU LDE rework and was only applied to 4 of 6 dedup tables. Revert the table maps to std HashMap and drop the hasher; it can land as its own focused PR.

The GPU full path is covered by the normal prove/verify suite built with --features cuda (plus gpu_path_fires_end_to_end), the CPU path by the non-cuda suite, and GPU/CPU equivalence by the merkle/barycentric parity tests. Its force-CPU leg also never ran on CPU: gpu_lde_threshold() only re-read the env var under cfg(test), but from the prover integration crate stark compiles without cfg(test), so the OnceLock cached the first value. Simplify gpu_lde_threshold() to a single cached impl now that the per-call re-read has no consumer.

keccak.cu: move keccak256_leaves_base_row_major out of keccak_merkle_level's doc block so the child-pair->parent doc rejoins its kernel. prover.rs: delete columns_to_row_major, which has no callers after the row-major GPU path stopped materializing GPU-expanded columns.

Extract coset_lde_row_major_inner shared by the base and ext3 _keep entry points (they differed only by m vs m*3 and the handle type), removing ~110 lines of drift-prone duplication. Add debug_assert!(num_rows >= 2) to launch_keccak_base_row_major: the kernel shifts by (64 - log_num_rows), UB at num_rows==1, matching the guard in launch_keccak_base.

The assert checked gpu_parts_lde_calls() > 0 with a comment claiming branch/shift tables are degree-3 — both false: fib_iterative_1M tables all have number_of_parts <= 2, and the common degree-2 case fires the fused two-halves path (gpu_extend_halves_calls), counted separately from the parts>2 path (gpu_parts_lde_calls) since #700. Assert on the sum so either composition-LDE path satisfies it. Validated on RTX 5090 / CUDA 13.1: make test-math-cuda 78/78, make test-cuda-integration green, proof verifies.

MauroToscano added 5 commits June 29, 2026 17:45

jotabulacios approved these changes Jun 29, 2026

View reviewed changes

jotabulacios merged commit 4bb4798 into perf/cpu-lde-rework_gpu Jun 29, 2026
11 checks passed

jotabulacios deleted the pr-715-fixes branch June 29, 2026 21:09

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Review fixes for #715: drop scope creep, dead code, redundant test; dedup row-major LDE#741

Review fixes for #715: drop scope creep, dead code, redundant test; dedup row-major LDE#741
jotabulacios merged 5 commits into
perf/cpu-lde-rework_gpufrom
pr-715-fixes

MauroToscano commented Jun 29, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

MauroToscano commented Jun 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changes

Verification

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

MauroToscano commented Jun 29, 2026 •

edited

Loading